Explore Efficient Data Organization for Large Scale Graph Analytics and Storage

Authors

  • Yinglong Xia
  • Ilie Gabriel Tanase
  • Lifeng Nai
  • Wei Tan
  • Yanbin Liu
  • Jason Crawford
  • Ching-Yung Lin

Abstract

Many Big Data analytics tasks essentially explore the relationships among interconnected entities, which are naturally represented as graphs. However, because graph computations have irregular data access patterns, delivering highly efficient solutions for large-scale graph analytics remains a fundamental challenge. This inefficiency restricts the use of many graph algorithms in Big Data scenarios. To address the performance issues in large-scale graph analytics, we develop a graph processing system called System G, which explores efficient graph data organization for parallel computing architectures. We discuss various graph data organizations and their impact on data locality during graph traversals, which leads to different cache performance behavior on the processor side. In addition, we analyze data parallelism from the architecture's perspective and experimentally show the efficiency of System G-based graph analytics. We present experimental results for commodity multicore clusters and IBM PERCS supercomputers to illustrate the performance of System G for large-scale graph analytics.

Keywords: Graph Processing, Parallel Computing, Scalability

Similar resources

MOCgraph: Scalable Distributed Graph Processing Using Message Online Computing

Existing distributed graph processing frameworks, e.g., Pregel, Giraph, GPS and GraphLab, mainly exploit main memory to support flexible graph operations for efficiency. Due to the complexity of graph analytics, huge memory space is required especially for those graph analytics that spawn large intermediate results. Existing frameworks may terminate abnormally or degrade performance seriously w...

An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity

The data grid technology, which uses the scale of the Internet to solve storage limitation for the huge amount of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environment to copy frequently accessed data in suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...

Storing and Analyzing Historical Graph Data at Scale

The work on large-scale graph analytics to date has largely focused on the study of static properties of graph snapshots. However, a static view of interactions between entities is often an oversimplification of several complex phenomena like the spread of epidemics, information diffusion, formation of online communities, and so on. Being able to find temporal interaction patterns, visualize th...

Analyzing Complex Data in Motion at Scale with Temporal Graphs

Modern analytics solutions succeed in understanding and predicting phenomena in a large diversity of software systems, from social networks to Internet-of-Things platforms. This success challenges analytics algorithms to deal with more and more complex data, which can be structured as graphs and evolve over time. However, the underlying data storage systems that support large-scale data analytics, ...

Geovisual Analytics and Storytelling Applied to a Flood Scenario

The large and ever-increasing amounts of multi-dimensional, multi-source, time-varying and geospatial digital information represent a major challenge for the analyst. The need to analyse and make decisions based on these information streams, often in time-critical situations, demands efficient, integrated and interactive tools that aid the user to explore, present, collaborate and communicate v...

Publication date: 2014